
Automatic and Harmless Regularization with Constrained and Lexicographic Optimization: A Dynamic Barrier Approach

Neural Information Processing Systems

Many machine learning tasks have to make a trade-off between two loss functions, typically the main data-fitness loss and an auxiliary loss. The most widely used approach is to optimize a linear combination of the objectives, which, however, requires manual tuning of the combination coefficient and is theoretically unsuitable for non-convex functions. In this work, we consider constrained optimization as a more principled approach for trading off two losses, with a special emphasis on lexicographic optimization, a degenerate limit of constrained optimization which optimizes a secondary loss inside the optimal set of the main loss. We propose a dynamic barrier gradient descent algorithm which provides a unified solution to both constrained and lexicographic optimization. We establish the convergence of the method for general non-convex functions.
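The linear-combination baseline that the abstract criticizes can be illustrated with a toy sketch. This is a hedged example, not the paper's algorithm: the losses `f`, `g` and the coefficient `beta` are illustrative choices, and the point is only that the resulting trade-off depends entirely on the hand-tuned coefficient.

```python
# Hedged toy sketch of the linear-combination approach to trading off
# a main loss f and an auxiliary loss g (illustrative functions, not
# the paper's dynamic barrier method).

def grad_f(x):
    # Gradient of the main loss f(x) = (x - 2)^2, minimized at x = 2.
    return 2.0 * (x - 2.0)

def grad_g(x):
    # Gradient of the auxiliary loss g(x) = x^2, minimized at x = 0.
    return 2.0 * x

def linear_combination_gd(beta, lr=0.1, steps=500):
    """Gradient descent on f + beta * g; the solution depends on beta."""
    x = 5.0
    for _ in range(steps):
        x -= lr * (grad_f(x) + beta * grad_g(x))
    return x

# beta = 0 recovers the minimizer of f alone (approximately x = 2);
# beta = 1 pulls the solution toward g's minimizer (approximately x = 1).
x_unregularized = linear_combination_gd(0.0)
x_regularized = linear_combination_gd(1.0)
```

Every choice of `beta` yields a different compromise point, which is exactly the manual-tuning burden the constrained and lexicographic formulations aim to remove.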



Reviews: Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Neural Information Processing Systems

I think the first part of the paper has very good original contributions with correct and nicely written proofs in the appendix. However, I have the following questions regarding the parts of the paper starting at Section 3. Sorry if these are redundant questions with obvious answers that I missed. The RGD framework is stated for both convex and non-convex functions (Lemma 4 does not require f to be convex). However, the examples provided are all convex functions, and the focus also seems to be quite heavily on convex functions (since no papers on non-convex optimization are compared against). Do the authors have (1) theoretical results and comparisons with existing work and/or (2) experiments for non-convex functions?


Reviews: Stochastic Mirror Descent in Variationally Coherent Optimization Problems

Neural Information Processing Systems

Adding experimental results, as they promised to R3, would be valuable as well. 3) As R2 pointed out, the intuition behind the analysis is not always clear. Given the rather convincing answers in the rebuttal, I think the authors can easily improve this aspect in the revised version.


New logarithmic step size for stochastic gradient descent

Shamaee, M. Soheil, Hafshejani, S. Fathi, Saeidian, Z.

arXiv.org Artificial Intelligence

Stochastic gradient descent (SGD), which dates back to the work of Robbins and Monro [1951a], is widely used for training modern Deep Neural Networks (DNNs), which achieve state-of-the-art results in multiple problem domains such as image classification Krizhevsky et al. [2017, 2009], object detection Redmon and Farhadi [2017], and automatic machine translation Zhang et al. [2015]. The value of the step size (or learning rate) is crucial for the convergence rate of SGD. Selecting an appropriate step size in each iteration ensures that the SGD iterates converge to an optimal solution. If the step size is too large, the SGD iterates may fail to reach the optimal point; conversely, excessively small step sizes can lead to slow convergence or cause a local minimum to be mistaken for the optimal solution Mishra and Sarawadekar [2019]. To address these challenges, various schemes have been proposed. One popular approach is the Armijo line search method, first introduced for SGD by Vaswani et al. [2019], which provides theoretical results for strongly convex, convex, and non-convex objective functions.
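The role of a decaying step size can be sketched in a few lines. This is a hedged illustration: the schedule `eta_t = eta_0 / log(t + 2)` is one plausible logarithmic decay, not necessarily the exact schedule proposed in the paper, and the quadratic objective is a toy stand-in for a DNN loss.

```python
import math

# Hedged sketch: gradient descent with a logarithmically decaying step
# size eta_t = eta_0 / log(t + 2). The schedule decays slowly, so early
# iterations take large steps while later ones refine the solution.

def gd_log_step(grad_fn, x0, eta0=0.5, steps=2000):
    x = x0
    for t in range(steps):
        eta = eta0 / math.log(t + 2)  # log(t + 2) avoids division by log(1) = 0
        x -= eta * grad_fn(x)
    return x

# Toy strongly convex objective f(x) = (x - 3)^2 with gradient 2(x - 3);
# the iterates should approach the minimizer x = 3.
x_final = gd_log_step(lambda x: 2.0 * (x - 3.0), x0=10.0)
```

A fixed large step size could oscillate around the optimum, while a too-aggressive decay (e.g. 1/t started too small) can stall far from it; the logarithmic schedule sits between these extremes.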


A Guide to Metaheuristic Optimization for Machine Learning Models in Python

#artificialintelligence

Mathematical optimization is the process of finding the set of inputs that maximizes (or minimizes) the output of a function. In the field of optimization, the function being optimized is called the objective function. A wide range of out-of-the-box tools exists for solving optimization problems, but most only work with well-behaved functions, also called convex functions. Well-behaved functions contain a single optimum, whether a maximum or a minimum value. Here a function can be thought of as a surface with a single valley (minimum) and/or hill (maximum).
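The single-valley property is exactly what simple out-of-the-box methods exploit. As a hedged illustration (the example function is an arbitrary choice), ternary search reliably locates the minimum of a unimodal one-dimensional function, but would fail in general on a function with multiple valleys, which is where metaheuristics come in.

```python
# Hedged sketch: ternary search finds the single minimum of a unimodal
# ("well-behaved") 1-D function on an interval. It relies on there being
# exactly one valley; with multiple local minima it can converge to the
# wrong one.

def ternary_search_min(f, lo, hi, tol=1e-6):
    """Locate the minimizer of a unimodal f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2  # minimum lies left of m2
        else:
            lo = m1  # minimum lies right of m1
    return (lo + hi) / 2.0

# Convex objective with a single valley at x = 1.5.
x_min = ternary_search_min(lambda x: (x - 1.5) ** 2, -10.0, 10.0)
```

Each iteration discards a third of the interval, so the search converges geometrically; the guarantee evaporates once the function has more than one valley.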